Abstract

No proof of convergence of Genetic Algorithms exists yet, and we do not supply one either.
Instead, we present some thoughts and arguments to convince the Reader that Genetic
Algorithms are essentially bound for success. For this purpose, we consider only the
crossover operators, single- or multiple-point, together with the selection procedure.

We also give a proof that soft selection is superior to other selection schemes.
I. Introduction

We use Genetic Algorithms (GAs) for solving hard global optimization problems
for at least three reasons:

• they are easy to implement in many computer languages,

• they are applicable to problems which cannot be easily, if at all, specified analytically
as a set of closed-form formulas,

• we believe that their inherent intelligence will automatically, i.e. with almost no
programmer's effort, find the way to solve our difficult problems "sufficiently well".

The very idea of GAs, to simply mimic Nature [1], belongs mostly to the sphere
of intuition and almost entirely lacks a solid mathematical background. Indeed, numerical
values of many important "tuning parameters" (mutation rate, probability of selection for
reproduction, etc.) are largely selected on the basis of the experience of other people
solving problems similar to ours. The hypothesis of "building blocks" turned out to be
false. Other investigations of GAs and their inner workings are rare. We simply believe
that following Nature's paths cannot be wrong. But are we right? And, if so, why?

II. Distance between parents and offspring

It is easy to see that, after the crossover operation, the distance between the resulting
offspring is identical to the distance between their parents [2]. Consider a pair of parents,
$p_a = (p_a^1, p_a^2, \ldots, p_a^N)$ and $p_b = (p_b^1, p_b^2, \ldots, p_b^N)$, consisting
of $N$ genes each. The distance between them may be calculated in many ways, depending on
the metric in use. In the simplest case, when each gene is just a binary digit, the Hamming
distance $d_H(\cdot, \cdot)$ is perhaps the most natural choice. It simply counts the number
of bits that differ at corresponding positions in the two given bit-strings. It is obvious
that in this case

$$d_H(p_a, p_b) = d_H(o_a, o_b) \tag{1}$$

since the parents, $p_a$ and $p_b$, differ at exactly the same positions as their offspring,
$o_a$ and $o_b$, do, regardless of how many crossover points were used.
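Property (1) can be checked numerically. The following is only a minimal sketch: the helper names `crossover` and `hamming`, the chromosome length and the random seed are illustrative assumptions, not part of the original text.

```python
import random

def crossover(pa, pb, cut_points):
    """Multi-point crossover: toggle gene swapping at every cut point."""
    oa, ob = list(pa), list(pb)
    cuts = set(cut_points)
    swapping = False
    for k in range(len(pa)):
        if k in cuts:
            swapping = not swapping
        if swapping:
            oa[k], ob[k] = pb[k], pa[k]
    return oa, ob

def hamming(x, y):
    """Number of positions at which x and y differ."""
    return sum(u != v for u, v in zip(x, y))

rng = random.Random(0)  # fixed seed, only for reproducibility
N = 16                  # assumed chromosome length
pa = [rng.randint(0, 1) for _ in range(N)]
pb = [rng.randint(0, 1) for _ in range(N)]
cut_points = rng.sample(range(1, N), 3)  # three distinct cut positions
oa, ob = crossover(pa, pb, cut_points)
print(hamming(pa, pb), hamming(oa, ob))  # the two numbers coincide
```

Swapping a gene between the two chromosomes never changes whether the pair differs at that position, which is why the number of cut points is irrelevant.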


When individual genes are more complex, i.e. when they consist of more bits (or, more
often, are symbols drawn from finite-size alphabet(s)), or even when they are just
real numbers, the same remains true in any metric induced by an $L_p$ norm. Indeed, the
expression

$$d_p(p_a, p_b) = \|p_a - p_b\|_p = \left( \sum_{k=1}^{N} \left| p_a^k - p_b^k \right|^p \right)^{1/p}, \qquad p = 1, 2, \ldots \tag{2}$$

has to be equal to $d_p(o_a, o_b)$, as the numerical components of the sum shown above are
identical in both cases; even their order is preserved. Property (1) also holds for the less
frequently used norm $L_\infty$: $\|x\|_\infty = \max_k |x_k|$.
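The same check can be repeated for real-valued genes. This is a sketch under assumed data: the two parent vectors and the cut position are arbitrary illustrations, and `lp_distance`, `linf_distance` and `one_point_crossover` are hypothetical helper names.

```python
def lp_distance(x, y, p):
    """Distance induced by the L_p norm, as in Eq. (2)."""
    return sum(abs(u - v) ** p for u, v in zip(x, y)) ** (1.0 / p)

def linf_distance(x, y):
    """Distance induced by the L_infinity (maximum) norm."""
    return max(abs(u - v) for u, v in zip(x, y))

def one_point_crossover(pa, pb, cut):
    """Exchange the tails of two chromosomes after position cut."""
    return pa[:cut] + pb[cut:], pb[:cut] + pa[cut:]

# Arbitrary real-valued parents, chosen only for illustration.
pa = [0.5, -1.25, 3.0, 2.0]
pb = [1.5, 0.75, -2.0, 2.5]
oa, ob = one_point_crossover(pa, pb, 2)

for p in (1, 2, 3):
    print(p, lp_distance(pa, pb, p), lp_distance(oa, ob, p))
print("inf", linf_distance(pa, pb), linf_distance(oa, ob))
```

Each printed pair agrees because the per-gene differences $|p_a^k - p_b^k|$ and $|o_a^k - o_b^k|$ are the same numbers in the same order.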

Consider now a triangle in genetic space defined by three vertices: two parent chromosomes,
$p_a$ and $p_b$, and any other fixed, but otherwise arbitrary, reference point $r$. We will
apply the crossover operator to the pair $(p_a, p_b)$, obtaining another pair of chromosomes
$(o_a, o_b)$, as shown in Fig. 1.




[Figure 1: parent $p_a$, parent $p_b$, arbitrary reference chromosome $r$, offspring $o_a$, offspring $o_b$]

Fig. 1. Two parent chromosomes, $p_a$ and $p_b$, an arbitrary reference chromosome $r$, and the pair of
offspring, $o_a$ and $o_b$. The integers $a_1$, $a_2$, $b_1$ and $b_2$ denote the numbers of genes (bits)
which differ between the respective parts of a given chromosome and the reference point in genetic space,
chromosome $r$.


It is easy to verify (think of the Hamming distance between chromosomes consisting of 1-bit
genes) that

$$\begin{cases} d_H(p_a, r) = a_1 + a_2 \\ d_H(p_b, r) = b_1 + b_2 \\ d_H(o_a, r) = a_1 + b_2 \\ d_H(o_b, r) = b_1 + a_2 \end{cases} \tag{3}$$

and, after adding together the first two rows and comparing the result with the
sum of the last two rows of Eq. (3), that

$$d_H(p_a, r) + d_H(p_b, r) = d_H(o_a, r) + d_H(o_b, r) = a_1 + a_2 + b_1 + b_2 \tag{4}$$

In the remaining cases, with discrete or continuous genes, other measures of the distance
between them may be used. Looking again at Fig. 1, we can conclude that in general the
following equality holds:

$$d_p^p(p_a, r) + d_p^p(r, p_b) = d_p^p(o_a, r) + d_p^p(r, o_b) \tag{5}$$

where $p$ is a positive, finite integer (see footnote 1), and $d_p$ is the distance induced by the $L_p$ norm.

1 For $p = \infty$ relation (5) is usually false. Take 2-gene chromosomes: $p_a = (1, 5)$, $p_b = (2, 0)$ and $r = (0, 0)$.
Then $o_a = (1, 0)$, $o_b = (2, 5)$, but $d_\infty(p_a, r) + d_\infty(p_b, r) = 7 \neq 6 = d_\infty(o_a, r) + d_\infty(o_b, r)$.

Relation (5) may be extended even further, just for elegance, by adding to the
left-hand side the $p$-th power of the distance between the parents and, to the right-hand
side, the $p$-th power of the distance between the offspring since, by virtue of (1), they
are equal to each other. Calling the sum of the lengths of the triangle's edges, each first
raised to the fixed integer power $p$, the generalized circumference, we may express our
result briefly as:

The generalized circumference of the triangle made of three chromosomes remains
unchanged when any two of them are replaced by their offspring.
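The invariance of the generalized circumference can be illustrated with a short numerical sketch. The integer-valued genes, the exponent $p = 3$, the seed and the mask-based crossover (which covers any multi-point crossover) are assumptions made only for this example.

```python
import random

def dist_pow(x, y, p):
    """p-th power of the L_p distance between chromosomes x and y."""
    return sum(abs(u - v) ** p for u, v in zip(x, y))

def mask_crossover(pa, pb, mask):
    """Swap genes wherever mask is True (covers any multi-point crossover)."""
    oa = [b if m else a for a, b, m in zip(pa, pb, mask)]
    ob = [a if m else b for a, b, m in zip(pa, pb, mask)]
    return oa, ob

def generalized_circumference(x, y, z, p):
    """Sum of the p-th powers of the three edge lengths of the triangle."""
    return dist_pow(x, y, p) + dist_pow(y, z, p) + dist_pow(z, x, p)

rng = random.Random(1)  # fixed seed, only for reproducibility
N, p = 8, 3             # assumed chromosome length and exponent
pa = [rng.randint(-5, 5) for _ in range(N)]
pb = [rng.randint(-5, 5) for _ in range(N)]
r = [rng.randint(-5, 5) for _ in range(N)]   # arbitrary reference chromosome
mask = [rng.random() < 0.5 for _ in range(N)]

oa, ob = mask_crossover(pa, pb, mask)
before = generalized_circumference(pa, pb, r, p)
after = generalized_circumference(oa, ob, r, p)
print(before, after)  # equal, since integer arithmetic is exact
```

Integer genes are used here so that the comparison is exact; with floating-point genes the two values would agree up to rounding.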

III. "Geometric" conclusion and discussion

Recall that the chromosome $r$ was chosen arbitrarily. One may wonder what happens
if $r = x^*$, i.e. when $r$ is the sought, still unknown, optimal chromosome (possibly one
of many) in the genetic space. If this is the case, then our finding may be expressed as
follows:

Since the sum of $p$-th powers of the distances between the parent chromosomes and the
desired solution is conserved by the crossover operator, the offspring chromosomes
cannot relocate too far from the optimal solution.

Proof:

Let the parents' distances from the solution be equal to $d_1$ and $d_2$, both strictly
positive, and the offspring's distances be $d_a$ and $d_b$, respectively. We can write:

$$d_1^p + d_2^p = d_a^p + d_b^p, \qquad p = 1, 2, \ldots < \infty$$

or, equivalently,

$$d_b = d_2 \left[ 1 + \frac{d_1^p - d_a^p}{d_2^p} \right]^{1/p}, \qquad p = 1, 2, \ldots < \infty$$

Depending on the relation between $d_a$ and $d_1$, the value of the expression appearing in
the square bracket may be higher than 1, lower than 1, or exactly equal to 1. Respectively,
we have

$$\begin{aligned} d_a < d_1 &\Rightarrow d_b > d_2 \\ d_a > d_1 &\Rightarrow d_b < d_2 \\ d_a = d_1 &\Rightarrow d_b = d_2 \quad \text{("rigid movement")} \end{aligned}$$

In words: if one of the offspring moves farther from the solution than one of its
parents, then the other offspring gets closer to the solution than the other parent. □

Comment. In the degenerate case, when exactly one of the parents is already an optimal
solution, it may happen that both offspring chromosomes are closer to the solution than
the second parent, with neither of them being the optimal point, which quite unexpectedly
appears to be a repeller rather than an attractor!

In conclusion: the outcome of the crossover operation may vary. One thing, however,
is sure. It can never happen that both offspring chromosomes are located farther
from the solution than their more distant parent. On the other hand, there is no
guarantee that at least one of them gets closer to the desired solution than its "better"
parent. Nevertheless, at least one of the newly created individuals is equally or less
distant from the solution than its "worse" parent. We use quotation marks since,
in reality, the chromosomes located closer to the optimal one (the "better" ones) need not
be better fitted. This is most easily seen in cases when genotypes arbitrarily close to the
solution are not acceptable at all, for example due to violation of constraints. So the
really important question is: Which one of the two offspring is closer to the solution?
Judgment based on their fitness alone cannot be regarded as reliable or conclusive.
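The statements above can be tested on random data. The sketch below assumes binary chromosomes, the Hamming distance, one-point crossover and a randomly drawn reference chromosome standing in for the unknown solution; all names and sizes are illustrative.

```python
import random

def hamming(x, y):
    """Number of positions at which x and y differ."""
    return sum(u != v for u, v in zip(x, y))

def one_point_crossover(pa, pb, cut):
    """Exchange the tails of two chromosomes after position cut."""
    return pa[:cut] + pb[cut:], pb[:cut] + pa[cut:]

rng = random.Random(3)  # fixed seed, only for reproducibility
N = 24                  # assumed chromosome length
violations = 0
for _ in range(5000):
    pa = [rng.randint(0, 1) for _ in range(N)]
    pb = [rng.randint(0, 1) for _ in range(N)]
    r = [rng.randint(0, 1) for _ in range(N)]  # stand-in for the solution
    oa, ob = one_point_crossover(pa, pb, rng.randint(1, N - 1))
    d1, d2 = hamming(pa, r), hamming(pb, r)
    da, db = hamming(oa, r), hamming(ob, r)
    if da + db != d1 + d2:          # conservation, Eq. (4)
        violations += 1
    if min(da, db) > max(d1, d2):   # both offspring beyond the worse parent
        violations += 1
    if max(da, db) < min(d1, d2):   # both offspring nearer than the better parent
        violations += 1
print(violations)
```

A non-zero count would contradict the conservation law; the third check covers the mirror case, which the same law excludes.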



TABLE I

Possible outcomes of the crossover operation. The positions of the symbols (p: parent, o: offspring) are
meaningful: the farther to the right, the bigger the distance from the desired solution. Cases in which
distances either remain unchanged or coincide are omitted.

No.   configuration   comments
1     oopp            impossible
2     opop
3     oppo
4     poop
5     popo
6     ppoo            impossible


We briefly summarize all interesting outcomes of the crossover operation in Tab. I.
Analyzing its contents, we see that the symbol "o" can be found in an advantageous position
exactly 6 times, and only twice in a disadvantageous one. Does this mean that the odds
for selecting "proper" offspring, i.e. for improving at least one trial solution, are 6 : 2?
The answer would be positive if the events 2-5 occurred with equal probability, which is
unlikely. On the other hand, if only case 5 (the worst) occurs again and again, then
the random, unbiased selection of one of the offspring would give us exactly (!) a 50%
chance to move closer to the reference chromosome $r$. This means that, in practice, the
chances for improvement can be even higher than 1/2; we will prove that rigorously in the
following section.

Important note: We have to carefully distinguish between the continuous and the discrete
case. In a discrete genetic space the only convergent sequences are those that are constant
from some point on. This is because there are no elements of a discrete genetic space which
would be located arbitrarily close to any existing chromosome, the optimal one in particular.
Therefore the notion of convergence is sensible and usable only in continuous cases.

On the other hand, since $r$ is arbitrary, it may have nothing to do with the optimal
solution. That is why the entire evolutionary process may not converge at all without
additional driving forces, other than the actions of crossover operators.

As we will show now, the key to the question of convergence is the selection process: the
practical realization of the Darwinian rule of evolution, survival of the fittest, understood
in a probabilistic sense rather than as an absolute rule.

IV. Chances for success 

The following text is based on a problem stated and solved by Latala in Delta [3],
a popular Polish monthly on mathematics, physics and astronomy, targeted mainly at
high-school students. The problem and its solution are freely rephrased by the present
author.

Problem:

Find the winning strategy in the following game:

Looking at an integer number, randomly chosen from two such numbers written
down by our opponent, guess whether the other (unknown) number is higher or
lower. The two numbers in question are distinct. We win when our guess is
correct; otherwise we lose.



Solution:

Use an arbitrary, strictly increasing sequence of numbers $\{c_k\}_{k=-\infty}^{\infty}$, each
belonging to the (open) interval $(0, 1)$, for example $c_k = \frac{1}{2} + \frac{1}{\pi} \arctan k$.
When the selected (known) number is equal to $k$, then with probability $c_k$ guess that the
other (unknown) number is lower or, with probability $1 - c_k$, that it is higher.

It is obvious that this strategy should work equally well not only for unknown integer
numbers, but also when the numbers are drawn from any countable subset of the reals. But
why does it work at all?

Let $p_{m,n}$ denote the probability that the numbers chosen by our opponent are $m$ and
$n$, with $m > n$. The probability that our guess is correct may be written as


$$\sum_{\substack{m, n \in Z \\ m > n}} p_{m,n} \left[ \frac{1}{2} c_m + \frac{1}{2} (1 - c_n) \right] = \frac{1}{2} + \frac{1}{2} \sum_{\substack{m, n \in Z \\ m > n}} p_{m,n} (c_m - c_n) \tag{6}$$


where $Z$ is the set of integers. The first term on the r.h.s. of (6) comes from the fact
that $\sum p_{m,n} = 1$. The second term is strictly positive, since $c_m > c_n$ for arbitrary
$m > n$ (as the sequence $\{c_k\}$ is strictly increasing) and at least one of the $p_{m,n}$ is
greater than zero. In conclusion: our chances to win always exceed 50%. This would not be so
if the sequence $\{c_k\}$ were not strictly increasing; in such circumstances our chances to
win could only be estimated as not less than $\frac{1}{2}$. Let us note that nothing certain
can be said about how much our chances to win exceed 50%. They will peak if all the
differences $c_m - c_n$ are maximized, at least those which "meet" non-zero $p_{m,n}$'s in
formula (6). Unfortunately, we know nothing in advance about the probabilities $p_{m,n}$.
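Formula (6) can be evaluated exactly for any concrete opponent. In the sketch below the opponent's distribution `pairs` is entirely made up for illustration, and `c` is the example sequence $c_k = \frac{1}{2} + \frac{1}{\pi} \arctan k$ quoted in the solution.

```python
import math

def c(k):
    """Example sequence from the text: strictly increasing, values in (0, 1)."""
    return 0.5 + math.atan(k) / math.pi

def win_probability(pairs, c):
    """Left-hand side of (6); pairs maps (m, n), m > n, to p_{m,n}."""
    total = 0.0
    for (m, n), prob in pairs.items():
        # Each of the two numbers is shown to us with probability 1/2.
        # Seeing m, we win by guessing "lower" (probability c(m));
        # seeing n, we win by guessing "higher" (probability 1 - c(n)).
        total += prob * (0.5 * c(m) + 0.5 * (1.0 - c(n)))
    return total

# Hypothetical opponent: three possible pairs with invented probabilities.
pairs = {(1, 0): 0.5, (10, -3): 0.3, (1001, 1000): 0.2}

lhs = win_probability(pairs, c)
rhs = 0.5 + 0.5 * sum(prob * (c(m) - c(n)) for (m, n), prob in pairs.items())
print(lhs, rhs)
```

Both sides agree up to rounding and exceed 1/2. Pairs of large, close numbers such as (1001, 1000) contribute very little, which illustrates why nothing can be said in advance about the size of the margin.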

How is the above problem related to Genetic Algorithms? Quite simply: the sequence
$\{c_k\}$ should be regarded as a tool to convert the value of fitness to the probability of
selection. The superiority of soft selection, realized with such a sequence $\{c_k\}$, over
hard selection schemes is evident. In the case of soft selection, our chances to win (i.e. to
improve the objective by selecting a better offspring for further processing) are always
higher than the chances of failure. In contrast, hard selection (see footnote 2) implies
that, in the unlucky event, both chances can be equal to each other.

The hard selection scheme can be considerably improved to become comparable with
soft selection. It is enough to select the number $n_0$ (see footnote 2) as any average
of the fitnesses of all individuals present in the previous generation. This trick should
work best in cases when our opponent, the objective function, produces only a few discrete
values.
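The effect of the threshold $n_0$ can be seen in a small exact calculation. The two pairs and their probabilities below are invented for illustration, and the arithmetic mean is used as one possible "average"; other averages would serve equally well in this sketch.

```python
def win_probability(pairs, c):
    """Left-hand side of (6); pairs maps (m, n), m > n, to p_{m,n}."""
    return sum(prob * (0.5 * c(m) + 0.5 * (1.0 - c(n)))
               for (m, n), prob in pairs.items())

def hard(n0):
    """Hard (step) selection: c_n = 0 for n < n0 and c_n = 1 otherwise."""
    return lambda k: 0.0 if k < n0 else 1.0

# Hypothetical opponent producing two pairs with equal probability.
pairs = {(6, 1): 0.5, (9, 2): 0.5}
values = [v for pair in pairs for v in pair]
mean = sum(values) / len(values)  # arithmetic mean as the adaptive threshold

badly_placed = win_probability(pairs, hard(0))   # threshold below every value
adaptive = win_probability(pairs, hard(mean))    # threshold at the average
print(badly_placed, adaptive)
```

With the threshold below all values every difference $c_m - c_n$ vanishes and the chance of winning is exactly 1/2; with the threshold at the mean both pairs are split and the chance rises above 1/2.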

The difference seems rather subtle: a strict versus a non-strict inequality. But let us
recall the brutal practice of the citizens of the ancient Greek city of Sparta. Striving to
have only excellent warriors as their descendants, they used to physically eliminate all
"defective" newborns. Did they succeed?

2 i.e. $c_n = 0$ for $n < n_0$, and $c_n = 1$ otherwise. If so, then $c_m - c_n$ in the r.h.s. of (6) is necessarily equal to
zero for many pairs $(m, n)$, $m > n$. For those pairs $(m, n)$ for which $c_m - c_n = 1$, the probabilities $p_{m,n}$, in turn,
may be equal to zero, corresponding to pairs never produced by our malicious (smart?) opponent.


V. More on selection

Consider an objective function with many local extrema of very similar fitness value,
yet having exactly one global optimum. The evolving population will sooner or later split
into many loosely connected clusters, concentrated around those extrema. To discover
the true, global optimum, we need the ability to correctly rank individuals with
very close values of their fitness. Only then will the "useless" individuals, located around
local extrema, become extinct. Therefore, in a particular computer implementation, not
every kind of average used as the threshold for hard, stepwise selection is equally good. To
increase our chances of success, and to accelerate the convergence as well, we should apply
the sequence $\{c_k\}$, or its continuous counterpart, which may be selected individually for
each new generation, in such a way that it changes most significantly around the majority
of fitness values across the population. When searching for a maximum, the following simple
and numerically plausible transformations from fitness to probability of selection, often
called scaling of the fitness function, satisfy this requirement:


$$c_k = \frac{1}{2} + \frac{1}{\pi} \arctan \frac{2 (k - f_{1/2})}{f_{3/4} - f_{1/4}} \qquad \text{or} \qquad c_k = \frac{1}{2} + \frac{1}{2} \tanh \frac{2 (k - f_{1/2})}{f_{3/4} - f_{1/4}} \tag{7}$$


with the first choice being definitely softer. The subscripted constants $f_\alpha$ denote the
respective quantiles (more precisely: quartiles) of the fitness distribution across the
current population, with $f_{1/2}$ being the median. Put unity (see footnote 3) into the
denominator when $f_{3/4} - f_{1/4} = 0$. Replace the addition in (7) with subtraction when
searching for a minimum.
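The transformations (7) translate into a few lines of code. This is a sketch only: the function name `selection_probabilities`, its parameters and the sample fitness values are assumptions, and the quartiles are taken from the standard-library `statistics.quantiles`, which is one of several acceptable ways to estimate them.

```python
import math
import statistics

def selection_probabilities(fitness, kind="arctan", maximize=True):
    """Map fitness values to selection probabilities following Eq. (7)."""
    q1, median, q3 = statistics.quantiles(fitness, n=4, method="inclusive")
    spread = q3 - q1
    if spread == 0:
        spread = 1.0  # "put unity into the denominator"
    sign = 1.0 if maximize else -1.0  # subtraction when minimizing
    probabilities = []
    for f in fitness:
        z = 2.0 * (f - median) / spread
        if kind == "arctan":
            probabilities.append(0.5 + sign * math.atan(z) / math.pi)
        else:  # the "tanh" variant, which is the sharper of the two
            probabilities.append(0.5 + sign * 0.5 * math.tanh(z))
    return probabilities

# Hypothetical fitness values of a small population, sorted for readability.
fitness = [1.0, 2.0, 2.5, 3.0, 10.0]
soft = selection_probabilities(fitness, "arctan")
sharp = selection_probabilities(fitness, "tanh")
for f, a, b in zip(fitness, soft, sharp):
    print(f, round(a, 4), round(b, 4))
```

The median individual always receives probability 1/2, and the outlier at 10.0 is pushed much closer to 1 by the tanh variant than by the arctan variant, which is the sense in which the first choice is softer.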

It is clear that GAs can be most effective for objectives for which only very limited
information is available, namely nothing but the fitness values computed for every member of
the population, usually only the last one. Their ability to quickly detect, and then to
concentrate in, the interesting parts of the search space makes them clearly superior to the
generic Monte Carlo approach, which wastes time on uniform and fruitless exploration of other
regions. The above is certainly true for objectives which are at least piecewise continuous
and have no singularities. For such a broad class of problems, with chromosomes coded
in a natural way (see footnote 4), the stopping rules should be based on the compactness of
the evolving population, paying only little attention to the behavior of the fitness. The
evolutionary process should be continued as long as the volume of the search space occupied
by the "better half" of the population still decreases. One should be aware, however, that
this strategy will fail for objectives with more than one global optimum or when the unique
extremum is degenerate (flat, improper), i.e. consists of more than a single point, either
in reality or due to rounding. If this is the case, then a careful analysis of the last
generation may be helpful.
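One possible reading of this stopping rule is sketched below. The text does not say how the occupied volume should be measured; the axis-aligned bounding box used here, the function names and the sample data are all assumptions made for illustration, with maximization assumed.

```python
import math

def better_half_volume(population, fitness):
    """Bounding-box volume of the better-fitted half of the population."""
    ranked = sorted(zip(fitness, population), key=lambda t: t[0], reverse=True)
    half = [x for _, x in ranked[: max(1, len(ranked) // 2)]]
    dims = len(half[0])
    return math.prod(
        max(x[i] for x in half) - min(x[i] for x in half) for i in range(dims)
    )

def should_continue(volumes):
    """Continue while the most recent volume is smaller than the previous one."""
    return len(volumes) < 2 or volumes[-1] < volumes[-2]

# Hypothetical population of 2-gene real chromosomes with their fitness values.
population = [(0.0, 0.0), (1.0, 2.0), (4.0, 4.0), (0.5, 1.0)]
fitness = [3.0, 2.0, 0.1, 2.5]
volume = better_half_volume(population, fitness)
print(volume, should_continue([2.0, volume]))
```

In a real run the list passed to `should_continue` would hold one volume per generation; a single non-decreasing step stops the process here, which a practical implementation might want to relax.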

For discrete problems (with integer and possibly boolean variables present) the notion of
continuity does not apply, so the task is to efficiently find an acceptable solution without
performing an exhaustive search. It can be shown [4] that for purely discrete problems we
need $O(N^2 \ln N)$ evaluations of the objective instead of the $2^N$ required by
the exhaustive search. $N$ is the number of bits, not unknowns, in a single chromosome.
To be precise: after $N^2 \ln N$ evaluations of the objective, the chance that the best
chromosome found so far is separated by no more than 1 unit of distance (in the Hamming
sense) from the optimal one is higher than 50%. No well-justified stopping rules can be
given for the discrete case.
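A short calculation puts the two evaluation counts side by side. The chromosome lengths are arbitrary, and the constant hidden in the $O(\cdot)$ notation is taken as 1, which is an assumption.

```python
import math

def ga_budget(n_bits):
    """N^2 ln N evaluations, with the hidden constant assumed to be 1."""
    return n_bits ** 2 * math.log(n_bits)

def exhaustive_budget(n_bits):
    """2^N evaluations needed to enumerate every bit-string of length N."""
    return 2 ** n_bits

for n_bits in (8, 16, 32, 64):
    print(n_bits, round(ga_budget(n_bits)), exhaustive_budget(n_bits))
```

For very short chromosomes the exhaustive search is cheaper, but the gap in favour of $N^2 \ln N$ opens quickly as $N$ grows.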

Mixed problems, involving real and integer unknowns, are even harder to analyze. 
From the formal point of view, such problems may be regarded as large, but finite, sets 
of continuous problems. 

3 The numbers appearing in both denominators need not be computed very precisely. Our choice is dictated
by purely numerical reasons: neither are the poorest individuals neglected, nor are the best-fitted ones
guaranteed to be selected.

4 By natural coding we mean a mapping of continuous unknowns to the genes representing them which is
strictly monotonic, and therefore invertible.



VI. Summary 


We have shown that Genetic Algorithms are bound for success. The chances for
improvement are always higher than for the lack of it if the selection of parents is
performed either in a soft, or in a hard but adaptive, manner. This is a very general result,
completely independent of the optimization problem under study. It applies equally well to
discrete, continuous and mixed optimization problems.

As we can see, the quite high chances of success of Genetic Algorithms are strictly
related to their property of neither rejecting nor ignoring the "bad" trial points in the
search space. In contrast, the rigorous, deterministic search methods are simply unable to
"jump over" the barrier surrounding even a single global minimum if started too far from the
solution.

Our result is of a stochastic rather than deterministic nature. In practice this may mean
that we may be entirely unable to improve the already known, approximate solution of our
particular problem. Nothing can be said about how quickly we will arrive at any improvement.
This may be significantly influenced by other components of GAs: mutation operators,
population size and the numerical values of various tuning parameters, and not by the
selection scheme or crossover mechanism alone.